
Clipping adaptors or overrepresented sequences can also raise the sequence length distribution warning, because clipping leaves reads of unequal length. Thus, we can first use “fastx_clipper” to remove the overrepresented sequences (see Figure 1.31). The following script removes a contaminating overrepresented sequence:

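# Clip the contaminating overrepresented sequence from the reads
fastx_clipper \
  -a ATCGGGAGAGGGGCGGGGAGGGGAAGAGGGGAGAATTCGGGGGGGGCCGG \
  -i bad_filt_trim.fastq \
  -o bad_filt_trim_clip.fastq \
  -v \
  -Q33

# Re-run FastQC on the clipped reads and open the HTML reports
fastqc bad_filt_trim_clip.fastq
htmlfiles=$(ls *.html)
firefox $htmlfiles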
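Before opening the FastQC report, the effect of clipping on read lengths can be checked directly from the command line. The following is a minimal sketch, not part of the FASTX-Toolkit or FastQC: the function name fastq_length_table is hypothetical, and it assumes a standard FASTQ file with four lines per record and unwrapped sequences.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a FASTX-Toolkit command): tally read lengths
# in a FASTQ file. Assumes 4-line records with unwrapped sequence lines.
fastq_length_table() {
    # The sequence is the second line of every 4-line record.
    # Output: one "length<TAB>count" row per distinct length, sorted by length.
    awk 'NR % 4 == 2 { count[length($0)]++ }
         END { for (len in count) print len "\t" count[len] }' "$1" | sort -n
}
```

For example, calling fastq_length_table bad_filt_trim_clip.fastq prints one row per distinct read length with the number of reads of that length. More than one row means the reads have unequal lengths, which is the condition behind the sequence length distribution warning.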

Some aligners used in the next step of the analysis may not accept reads of unequal length. Figure 1.34 shows the sequence length distribution, in which the reads have different lengths. If the aligner that we intend to use has this restriction, we can filter out all reads shorter than 150 bases using the following bash script:

FIGURE 1.34  Sequence length distribution (different lengths).